Bouar
A robust synthetic data generation framework for machine learning in High-Resolution Transmission Electron Microscopy (HRTEM)
DaCosta, Luis Rangel, Sytwu, Katherine, Groschner, Catherine, Scott, Mary
Machine learning techniques are attractive options for developing highly accurate automated analysis tools for nanomaterials characterization, including high-resolution transmission electron microscopy (HRTEM). However, implementing such tools successfully can be difficult because large, high-quality training datasets are hard to procure from experiments. In this work, we introduce Construction Zone, a Python package for rapidly generating complex nanoscale atomic structures, and develop an end-to-end workflow for creating large simulated databases for training neural networks. Construction Zone enables fast, systematic sampling of realistic nanomaterial structures and can serve as a random structure generator for simulated databases, which is important for generating large, diverse synthetic datasets. Using HRTEM imaging as an example, we train a series of neural networks on various subsets of our simulated databases to segment nanoparticles, and we holistically study the data curation process to understand how aspects of the curated simulated data (including simulation fidelity, the distribution of atomic structures, and the distribution of imaging conditions) affect model performance across several experimental benchmarks. Based on these results, we achieve state-of-the-art segmentation performance on experimental HRTEM images of nanoparticles from several experimental benchmarks, and we discuss robust strategies for consistently achieving high performance with machine learning in experimental settings using purely synthetic data.
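To make "random structure generator" concrete, here is a minimal sketch of one kind of randomized sampling such a generator might perform: carving a randomly sized, randomly oriented spherical particle out of an FCC lattice. It does not use or reproduce Construction Zone's API; the function names (`fcc_lattice`, `random_rotation`, `random_spherical_particle`), the spherical shape, and the gold-like default lattice constant of 4.08 Å are all assumptions for illustration.

```python
import numpy as np


def fcc_lattice(a, n):
    """Hypothetical helper: FCC lattice points in an n x n x n block of unit cells (a in Å), centered on the origin."""
    basis = np.array([[0.0, 0.0, 0.0],
                      [0.5, 0.5, 0.0],
                      [0.5, 0.0, 0.5],
                      [0.0, 0.5, 0.5]])
    cells = np.stack(np.meshgrid(np.arange(n), np.arange(n), np.arange(n),
                                 indexing="ij"), axis=-1).reshape(-1, 3)
    points = (cells[:, None, :] + basis[None, :, :]).reshape(-1, 3) * a
    return points - points.mean(axis=0)  # the mean is the geometric center of the block


def random_rotation(rng):
    """Uniformly random 3D rotation from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so q is uniform over O(3)
    if np.linalg.det(q) < 0:     # flip one column to get a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q


def random_spherical_particle(rng, a=4.08, r_min=10.0, r_max=25.0):
    """Sample a spherical FCC particle with random radius and orientation (coordinates in Å)."""
    radius = rng.uniform(r_min, r_max)
    n = int(np.ceil(2 * radius / a)) + 2  # block half-width exceeds the radius, so the sphere fits
    lattice = fcc_lattice(a, n) @ random_rotation(rng).T
    keep = np.linalg.norm(lattice, axis=1) <= radius
    return lattice[keep], radius


rng = np.random.default_rng(0)
atoms, radius = random_spherical_particle(rng)
print(atoms.shape, round(radius, 1))
```

Repeated calls with one generator yield a population of particles that differ in size and orientation. A real generator would also vary shape, defects, and supports before the structures are passed to an image simulator.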
Leveraging generative adversarial networks to create realistic scanning transmission electron microscopy images
Khan, Abid, Lee, Chia-Hao, Huang, Pinshane Y., Clark, Bryan K.
The rise of automation and machine learning (ML) in electron microscopy has the potential to revolutionize materials research through autonomous data collection and processing. A significant challenge lies in developing ML models that rapidly generalize to large data sets acquired under varying experimental conditions. We address this challenge by employing a cycle generative adversarial network (CycleGAN) with a reciprocal-space discriminator, which augments simulated data with realistic spatial-frequency information. This allows the CycleGAN to generate images that are nearly indistinguishable from real data and to provide labels for ML applications. We showcase our approach by training a fully convolutional network (FCN) to identify single-atom defects in a 4.5-million-atom data set collected using automated acquisition in an aberration-corrected scanning transmission electron microscope (STEM). Our method produces adaptable FCNs that can adjust to dynamically changing experimental variables with minimal intervention, marking a crucial step toward fully autonomous harnessing of microscopy big data.
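The abstract does not say exactly how the reciprocal-space discriminator sees an image. A common way to expose spatial-frequency content is the log-magnitude of the centered 2D Fourier transform, and the sketch below computes that representation with NumPy. The function name `reciprocal_space_view`, the standardization step, and the assumption that this is the discriminator's input are illustrative choices, not the authors' implementation.

```python
import numpy as np


def reciprocal_space_view(image, eps=1e-8):
    """Hypothetical preprocessing: standardized log-magnitude of the centered 2D FFT.

    One plausible input for a reciprocal-space discriminator; not taken from the
    original work.
    """
    image = np.asarray(image, dtype=float)
    # Remove the mean so the zero-frequency term does not dominate, then center
    # the zero frequency in the array.
    spectrum = np.fft.fftshift(np.fft.fft2(image - image.mean()))
    magnitude = np.log(np.abs(spectrum) + eps)  # compress the large dynamic range
    # Standardize so real and generated images are on a comparable scale.
    return (magnitude - magnitude.mean()) / (magnitude.std() + eps)


# A synthetic lattice fringe with 8 periods across a 64-pixel image produces a
# pair of bright spots 8 pixels on either side of the center along one axis.
N = 64
x = np.arange(N)
fringes = np.tile(np.cos(2 * np.pi * 8 * x / N), (N, 1))
view = reciprocal_space_view(fringes)
```

In a CycleGAN setup along these lines, a discriminator operating on such a view would penalize generated images whose lattice spots, noise floor, or high-frequency falloff differ from those of experimental images, complementing a discriminator that operates in real space.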